Abstract

The paper presents higher dimensional consensus (HDC) for large-scale networks. HDC generalizes
the well-known average-consensus algorithm. It divides the nodes of the large-scale network into anchors 
and sensors. Anchors are nodes whose states are fixed over the HDC iterations, whereas sensors are nodes 
that update their states as a linear combination of the neighboring states. Under appropriate conditions, 
we show that the sensors' states converge to a linear combination of the anchors' states. Through the 
concept of anchors, HDC captures in a unified framework several interesting network tasks, including 
distributed sensor localization, leader-follower, distributed Jacobi to solve linear systems of algebraic 
equations, and, of course, average-consensus. 

In many network applications, it is of interest to learn the weights of the distributed linear algorithm 
so that the sensors converge to a desired state. We term this inverse problem the HDC learning problem. 
We pose learning in HDC as a constrained non-convex optimization problem, which we cast in the
framework of multi-objective optimization (MOP) and to which we apply Pareto optimality. We prove
analytically relevant properties of the MOP solutions and of the Pareto front from which we derive 
the solution to learning in HDC. Finally, the paper shows how the MOP approach resolves interesting 
tradeoffs (speed of convergence versus quality of the final state) arising in learning in HDC in resource 
constrained networks. 
I. Introduction

This paper provides a unified framework, higher dimensional consensus (HDC), for the analysis and
design of linear distributed algorithms for large-scale networks, including the distributed Jacobi
algorithm [1], average-consensus [2], [3], [4], [5], [6], [7], distributed sensor localization [8],
distributed matrix inversion [9], and leader-follower algorithms [10], [11]. These applications arise
in many resource-constrained large-scale networks, e.g., sensor networks and teams of robotic
platforms, but also in cyber-physical systems like the smart grid in electric power systems. We view
these systems as a collection of nodes interacting over a sparse communication graph. The nodes have,
in general, strict constraints on their communication and computation budgets, so that only local
communication and low-order computation are feasible at each node.

Linear distributed algorithms for constrained large-scale networks are iterative in nature; the information 
is fused over the iterations of the algorithm across the sparse network. In our formulation of HDC, we 
view the large-scale network as a graph with edges connecting sparsely a collection of nodes; each 
node is described by a state. The nodes are partitioned into anchors and sensors. Anchors do not update
their state over the HDC iterations, while the sensors iteratively update their states by a linear, possibly
convex, combination of their neighbors' states. The weights of this linear combination are the
parameters of the HDC. For example, in sensor localization [8], the state at each node is its current
position estimate. Anchors may be nodes instrumented with a GPS unit, which therefore know their precise
locations; the remaining nodes are the sensors, which do not know their locations and whose states,
i.e., their location estimates, HDC updates iteratively in a distributed fashion. For this problem, the
weights of HDC are the barycentric coordinates of the sensors with respect to a group of neighboring
nodes, see [8].

We consider two main issues in HDC. 

Analysis: Forward Problem. Given the HDC weights (parameters) and the sparse underlying connectivity
graph, determine (i) under what conditions the HDC converges; (ii) to what state the HDC converges;
and (iii) what the convergence rate is. The forward problem thus establishes the conditions
for convergence, the convergence rate, and the limiting state of the network.

Learning: Inverse Problem. Given the desired state to which HDC should converge and the sparsity
graph, learn the HDC parameters so that HDC indeed converges to that state. Due to the sparsity
constraints, it may not be possible for HDC to converge exactly to the desired state. An interesting tradeoff
that we pursue is between the speed of convergence and the quality of the limiting HDC state,
given by some measure of the error between the final state and the desired state. Clearly, the learning
problem is an inverse problem that we will formulate as the minimization of a utility function under
constraints.

A naive formulation of the learning problem is not feasible. We instead formulate it as a constrained
non-convex optimization problem, which we solve by casting it as a multi-objective optimization
problem (MOP) [12] and applying Pareto optimization. To derive the optimal Pareto solution,
we need to characterize the Pareto front (the locus of Pareto-optimal solutions). Although the Pareto
front is usually hard to determine and requires extensive iterative procedures, we exploit the structure
of our problem to prove smoothness, convexity, strict decreasing monotonicity, and differentiability
properties of the Pareto front. With the help of these properties, we derive an efficient procedure to
generate Pareto-optimal solutions to the MOP, determine the Pareto front, and find the solution to the
learning problem. This solution is found by a rather expressive geometric argument.

A. Organization of the Paper 

We now describe the rest of the paper. Section II introduces notation and relevant definitions, whereas
Section III provides the problem formulation. We discuss the forward problem (analysis of HDC)
in Section IV and the inverse problem (learning in large-scale networks) in Sections V-VII. Finally,
Section VIII concludes the paper.

II. Preliminaries 

This section introduces the notation used in the paper and reviews relevant concepts from spectral 
graph theory and multi-objective optimization. 

A. Spectral Graph Theory 

Consider a sensor network with N nodes. We partition this network into K anchors and M sensors, 
such that N = K + M. As discussed in Section I, the anchors are the nodes whose states are fixed, and 
the sensors are the nodes that update their states as a linear combination of the states of their neighboring 
nodes. Let $\kappa = \{1, \ldots, K\}$ be the set of anchors and let $\Omega = \{K+1, \ldots, N\}$ be the set of sensors. The
set of all of the nodes is then denoted by $\Theta = \kappa \cup \Omega$.

We model the network by a directed graph, $\mathcal{G} = (V, A)$, where $V = \{1, \ldots, N\}$ denotes the set of
nodes. The interconnections among the nodes are given by the adjacency matrix, $A = \{a_{lj}\}$, where

\[ a_{lj} = \begin{cases} 1, & l \leftarrow j, \\ 0, & \text{otherwise}, \end{cases} \tag{1} \]

and the notation $l \leftarrow j$ implies that node $j$ can send information to node $l$. The neighborhood, $\mathcal{K}(l)$, at
node $l$ is

\[ \mathcal{K}(l) = \{ j \mid a_{lj} = 1 \}. \tag{2} \]

The classification of nodes into sensors and anchors naturally induces the partitioning of the
neighborhood, $\mathcal{K}(l)$, at each sensor, $l$, i.e.,

\[ \mathcal{K}_{\Omega}(l) = \mathcal{K}(l) \cap \Omega, \qquad \mathcal{K}_{\kappa}(l) = \mathcal{K}(l) \cap \kappa, \tag{3} \]

where $\mathcal{K}_{\Omega}(l)$ and $\mathcal{K}_{\kappa}(l)$ are the set of sensors and the set of anchors in sensor $l$'s neighborhood,
respectively.

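As a graph can be characterized by its adjacency matrix, to every matrix we can associate a graph.
For a matrix, $\Upsilon = \{\upsilon_{lj}\} \in \mathbb{R}^{N \times N}$, we define its associated graph by $\mathcal{G}^{\Upsilon} = (V^{\Upsilon}, A^{\Upsilon})$, where
$V^{\Upsilon} = \{1, \ldots, N\}$ and $A^{\Upsilon} = \{a^{\Upsilon}_{lj}\}$ is given by

\[ a^{\Upsilon}_{lj} = \begin{cases} 1, & \upsilon_{lj} \neq 0, \\ 0, & \upsilon_{lj} = 0. \end{cases} \tag{4} \]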



The convergence properties of distributed algorithms depend on spectral properties of associated 
matrices. In the following, we recall the definition of spectral radius. The spectral radius, $\rho(P)$, of a
matrix, $P \in \mathbb{R}^{M \times M}$, is defined as

\[ \rho(P) = \max_i |\lambda_i(P)|, \tag{5} \]

where $\lambda_i(P)$ denotes the $i$th eigenvalue of $P$. We also have

\[ \rho(P) = \lim_{q \to \infty} \|P^q\|^{1/q}, \tag{6} \]

where $\|\cdot\|$ is any matrix induced norm.
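As a small numerical illustration of (5) and (6), the following sketch compares the largest eigenvalue magnitude of a matrix with the quantity $\|P^q\|^{1/q}$ for growing $q$. The matrix entries, the helper names `spectral_radius` and `gelfand_estimate`, and the choice of the induced 2-norm are assumptions made only for this example; they do not come from the paper.

```python
import numpy as np

def spectral_radius(P):
    """Largest eigenvalue magnitude of a square matrix, as in (5)."""
    return float(np.max(np.abs(np.linalg.eigvals(P))))

def gelfand_estimate(P, q, norm_ord=2):
    """||P^q||^(1/q) for an induced norm; it approaches rho(P) as q grows, as in (6)."""
    Pq = np.linalg.matrix_power(P, q)
    return float(np.linalg.norm(Pq, norm_ord) ** (1.0 / q))

# Hypothetical 3 x 3 example matrix (not taken from the paper).
P = np.array([[0.5, 0.4, 0.0],
              [0.0, 0.3, 0.6],
              [0.2, 0.0, 0.1]])

rho = spectral_radius(P)
estimates = [gelfand_estimate(P, q) for q in (1, 10, 100)]
print("rho(P) =", rho)
print("||P^q||^(1/q) for q = 1, 10, 100:", estimates)
```

For any induced norm the estimate is never below $\rho(P)$, which is the inequality used later to replace the spectral radius constraint by a norm constraint.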

B. Multi-objective Optimization Problem (MOP): Pareto-Optimality 

In this subsection, we recall facts on multi-objective optimization theory that we will use to develop 
the solutions of the learning problem. We consider the following constrained optimization problem. 
Let $\{f_k(\mathbf{y})\}_{k=1,\ldots,n}$ be real-valued functions,

\[ f_k : X \rightarrow \mathbb{R}, \quad \forall k, \tag{7} \]

on some topological space, $X$ (in this work, $X$ will always be a finite-dimensional vector space). The
vector objective function $\mathbf{f}(\mathbf{y})$ is

\[ \mathbf{f}(\mathbf{y}) = \begin{bmatrix} f_1(\mathbf{y}) \\ \vdots \\ f_n(\mathbf{y}) \end{bmatrix}. \tag{8} \]

Let $\{v_k\}$ be a family of $n$ real-valued functions on $X$ representing the inequality constraints and $\{w_k\}$
be a family of $n$ real-valued functions on $X$ representing the equality constraints. The feasible set of
solutions, $\mathcal{Y}$, is defined as

\[ \mathcal{Y} = \{ \mathbf{y} \in X \mid v_k(\mathbf{y}) \leq V_k, \ \forall k, \text{ and } w_k(\mathbf{y}) = W_k, \ \forall k \}, \tag{9} \]

where $V_k, W_k \in \mathbb{R}$. The multi-objective optimization problem (MOP) is given by

\[ \min_{\mathbf{y} \in \mathcal{Y}} \mathbf{f}(\mathbf{y}). \tag{10} \]

Note that the inequality constraints, $v_k(\mathbf{y})$, and the equality constraints, $w_k(\mathbf{y})$, appear in the set of
feasible solutions and, thus, are implicit in (10).

In general, the MOP has non-inferior solutions, i.e., the MOP has a set of solutions none of which is 
inferior to the other. The solutions of the MOP are, thus, categorized as Pareto-optimal [12], defined in 
the following. 

Definition 1 [Pareto optimal solutions] A solution, $\mathbf{y}^*$, is said to be a Pareto optimal (or non-inferior)
solution of a MOP if there exists no other feasible $\mathbf{y}$ (i.e., there is no $\mathbf{y} \in \mathcal{Y}$) such that $\mathbf{f}(\mathbf{y}) \leq \mathbf{f}(\mathbf{y}^*)$,
meaning that $f_k(\mathbf{y}) \leq f_k(\mathbf{y}^*)$, $\forall k$, with strict inequality for at least one $k$.

General methods to solve the MOP include, for example, the weighting method, the Lagrangian
method, and the $\varepsilon$-constraint method. These methods can be used to find the Pareto-optimal solutions of
the MOP. In general, these approaches require extensive iterative procedures to establish Pareto-optimality
of a solution, see [12] for details.
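To make Definition 1 concrete, the sketch below filters a finite list of objective vectors down to its non-inferior members. It only illustrates the dominance test; it is not one of the solution methods named above, and the candidate points and the function names `dominates` and `pareto_optimal` are hypothetical.

```python
def dominates(fa, fb):
    """True if fa is no worse than fb in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better

def pareto_optimal(points):
    """Keep the points that no other point in the list dominates (Definition 1)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors (f_1, f_2) of five feasible candidates.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]
front = pareto_optimal(candidates)
print(front)
```

In this toy list, (3.0, 4.0) and (2.0, 3.5) are both dominated by (2.0, 3.0); the remaining three points trade one objective against the other and are mutually non-inferior.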

III. Problem Formulation 

Consider a sensor network with $N$ nodes communicating over a network described by a directed
graph, $\mathcal{G} = (V, A)$. Let $\mathbf{u}_k(t) \in \mathbb{R}^{1 \times m}$ be the state associated to the $k$th anchor, and let $\mathbf{x}_l(t) \in \mathbb{R}^{1 \times m}$ be the
state associated to the $l$th sensor. We are interested in studying linear iterative algorithms of the form

\[ \mathbf{u}_k(t+1) = \mathbf{u}_k(t) = \mathbf{u}_k(0), \qquad t \geq 0, \ k \in \kappa, \tag{11} \]

\[ \mathbf{x}_l(t+1) = p_{ll}\,\mathbf{x}_l(t) + \sum_{j \in \mathcal{K}_{\Omega}(l)} p_{lj}\,\mathbf{x}_j(t) + \sum_{k \in \mathcal{K}_{\kappa}(l)} b_{lk}\,\mathbf{u}_k(0), \qquad t \geq 0, \ l \in \Omega, \tag{12} \]

where $t$ is the discrete-time iteration index, and the $b_{lk}$'s and $p_{lj}$'s are the state updating coefficients. We
assume that the updating coefficients are constant over the components of the $m$-dimensional state, $\mathbf{x}_l(t)$.
We term distributed linear iterative algorithms of the form (11)-(12) as Higher Dimensional Consensus
(HDC) algorithms^1 [11], [10].

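For the purpose of analysis, we write the HDC (11)-(12) in matrix form. Define

\[ U(t) = \left[ \mathbf{u}_1^T(t), \ldots, \mathbf{u}_K^T(t) \right]^T \in \mathbb{R}^{K \times m}, \qquad X(t) = \left[ \mathbf{x}_{K+1}^T(t), \ldots, \mathbf{x}_N^T(t) \right]^T \in \mathbb{R}^{M \times m}, \tag{13} \]

\[ P = \{p_{lj}\} \in \mathbb{R}^{M \times M}, \qquad B = \{b_{lk}\} \in \mathbb{R}^{M \times K}. \tag{14} \]

With the above notation, we write (11)-(12) concisely as

\[ \begin{bmatrix} U(t+1) \\ X(t+1) \end{bmatrix} = \begin{bmatrix} I & 0 \\ B & P \end{bmatrix} \begin{bmatrix} U(t) \\ X(t) \end{bmatrix}, \tag{15} \]

\[ \triangleq \ C(t+1) = \Upsilon C(t). \tag{16} \]

Note that the graph, $\mathcal{G}^{\Upsilon}$, associated to the $N \times N$ iteration matrix, $\Upsilon$, is a subgraph of the
communication graph, $\mathcal{G}$. In other words, the sparsity of $\Upsilon$ is dictated by the sparsity of the underlying sensor
network, i.e., a non-zero element, $\upsilon_{lj}$, in $\Upsilon$ implies that node $j$ can send information to node $l$ in the
original graph $\mathcal{G}$. In the iteration matrix, $\Upsilon$, the $M \times M$ lower right submatrix, $P$, collects the updating
coefficients of the $M$ sensors with respect to the $M$ sensors, and the lower left submatrix, $B$, collects
the updating coefficients of the $M$ sensors with respect to the $K$ anchors. From (15), the matrix form
of the HDC in (12) is

\[ X(t+1) = P X(t) + B U(0), \qquad t \geq 0. \tag{17} \]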

In this paper, we study the following two problems that arise in the context of the HDC. 

Analysis: Forward problem. Given an $N$-node sensor network with a communication graph, $\mathcal{G}$, the
matrices $B$ and $P$, and the network initial conditions, $X(0)$ and $U(0)$: what are the conditions under
which the HDC converges? What is the convergence rate of the HDC? If the HDC converges, what is
the limiting state of the network?

Learning: Inverse problem. Given an $N$-node sensor network with a communication graph, $\mathcal{G}$, and
an $M \times K$ weight matrix, $W$, learn the matrices $P$ and $B$ in (17) such that the HDC converges to

\[ \lim_{t \to \infty} X(t+1) = W U(0), \tag{18} \]

for every $U(0) \in \mathbb{R}^{K \times m}$; if multiple solutions exist, we are interested in finding a solution that leads to
fastest convergence.

^1 As we will show later, the HDC algorithms contain the conventional average consensus algorithms, [3], [4], as a special
case. The notion of higher dimensions is technical and will be studied later.

IV. Forward Problem: Higher Dimensional Consensus 

As discussed in Section III, the HDC algorithm is implemented as (11)-(12), and its matrix
representation is given by (16). We divide the study of the HDC into the following two categories.

(A) No anchors: $B = 0$

(B) Anchors: $B \neq 0$

We analyze these two cases separately. In addition, we also provide, briefly, practical applications where
each of them is relevant.

A. No anchors: $B = 0$

In this case, the HDC reduces to

\[ X(t+1) = P X(t) = P^{t+1} X(0). \tag{19} \]

An important problem covered by this case is average-consensus. As is well known, if

\[ \rho(P) = 1, \tag{20} \]

with $\mathbf{1}^T$ and $\mathbf{1}$ being the left and right eigenvectors of $P$, respectively, then we have

\[ \lim_{t \to \infty} P^{t+1} = \frac{\mathbf{1}\mathbf{1}^T}{M}, \tag{21} \]

under some minimal network connectivity assumptions, where $\mathbf{1}$ is the $M \times 1$ column vector of 1's
and $M$ is the number of sensors. The sensors converge to the average of the initial sensors' states. The
convergence rate is dictated by the second largest (in magnitude) eigenvalue of the matrix $P$. For more
precise and general statements in this regard, see for instance, [3], [4]. Average-consensus, thus, is a
special case of the HDC, when $B = 0$ and $\rho(P) = 1$. This problem has been studied in great detail.
Relevant references include [13], [14], [15], [16], [17], [18], [19]. The rest of this paper deals entirely with
the case $\rho(P) < 1$ and the term HDC subsumes the $\rho(P) < 1$ case, unless explicitly noted. When $B = 0$,
the HDC (with $\rho(P) < 1$) leads to $X_\infty = 0$, which is not very interesting.
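The limit (21) can be checked numerically on a small example. The sketch below assumes a hypothetical ring of $M = 5$ sensors in which every sensor averages its own state with those of its two ring neighbors using equal weights of 1/3; this weight choice and the initial states are illustrative assumptions, not the paper's design.

```python
import numpy as np

M = 5
# Hypothetical ring network: each sensor weights itself and its two neighbors by 1/3.
P = np.zeros((M, M))
for l in range(M):
    P[l, l] = 1.0 / 3.0
    P[l, (l - 1) % M] = 1.0 / 3.0
    P[l, (l + 1) % M] = 1.0 / 3.0

X0 = np.array([[1.0], [4.0], [2.0], [8.0], [5.0]])   # initial sensor states, m = 1

# No anchors: X(t+1) = P X(t), as in (19).
X = X0.copy()
for _ in range(200):
    X = P @ X

limit_matrix = np.ones((M, M)) / M                   # right-hand side of (21)
print("state after 200 iterations:", X.ravel())
print("average of initial states: ", X0.mean())
```

Here $P$ is symmetric with unit row sums, so $\mathbf{1}$ is both a left and a right eigenvector with eigenvalue 1, and every sensor approaches the initial average.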

B. Anchors: $B \neq 0$

This extends the average-consensus to "higher dimensions" (as will be explained in Section IV-C).
Lemma 1 establishes: (i) the conditions under which the HDC converges; (ii) the limiting state of the
network; and (iii) the rate of convergence of the HDC.

Lemma 1 Let $B \neq 0$. If

\[ \rho(P) < 1, \tag{22} \]

then the limiting state of the sensors is

\[ X_\infty = \lim_{t \to \infty} X(t+1) = (I - P)^{-1} B U(0), \tag{23} \]

and the error, $E(t) = X(t) - X_\infty$, decays exponentially to 0 with exponent $\ln(\rho(P))$, i.e.,

\[ \limsup_{t \to \infty} \frac{1}{t} \ln \|E(t)\| \leq \ln(\rho(P)). \tag{24} \]

Proof: From (17), we note that

\[ X(t+1) = P^{t+1} X(0) + \sum_{k=0}^{t} P^k B U(0), \tag{25} \]

\[ \Rightarrow \ X_\infty = \lim_{t \to \infty} P^{t+1} X(0) + \lim_{t \to \infty} \sum_{k=0}^{t} P^k B U(0), \tag{26} \]

and (23) follows from (22) and Lemma 9 in Appendix I. The error, $E(t)$, is given by

\begin{align*}
E(t) &= X(t) - (I - P)^{-1} B U(0), \\
     &= P^t X(0) + \sum_{k=0}^{t-1} P^k B U(0) - \sum_{k=0}^{\infty} P^k B U(0), \\
     &= P^t \left[ X(0) - \sum_{k=0}^{\infty} P^k B U(0) \right].
\end{align*}

To go from the second equation to the third, we recall (22) and use (112) from Lemma 9 in Appendix I.
Let $R = X(0) - \sum_{k=0}^{\infty} P^k B U(0)$. To establish the convergence rate of $\|E(t)\|$, we have

\begin{align*}
\frac{1}{t} \ln \|E(t)\| &= \frac{1}{t} \ln \|P^t R\|, \\
 &\leq \frac{1}{t} \ln \left( \|P^t\| \, \|R\| \right), \\
 &\leq \ln \|P^t\|^{1/t} + \frac{1}{t} \ln \|R\|. \tag{27}
\end{align*}

Now, letting $t \to \infty$ on both sides, we get

\begin{align*}
\limsup_{t \to \infty} \frac{1}{t} \ln \|E(t)\| &\leq \limsup_{t \to \infty} \ln \|P^t\|^{1/t} + \limsup_{t \to \infty} \frac{1}{t} \ln \|R\|, \tag{28} \\
 &= \ln \lim_{t \to \infty} \|P^t\|^{1/t}, \tag{29} \\
 &= \ln(\rho(P)), \tag{30}
\end{align*}

and (24) follows. The interchange of lim and ln is permissible because of the continuity of ln, and the
last step follows from (6). ■
The above lemma shows that we require (22) for the HDC to converge. The limiting state of the
network, $X_\infty$, is given by (23) and the error norm, $\|E(t)\|$, decays exponentially to zero with
exponent $\ln(\rho(P))$. We further note that the limit state of the sensors, $X_\infty$, is independent of the sensors'
initial conditions, i.e., the algorithm forgets the sensors' initial conditions and converges to (23) for
any $X(0) \in \mathbb{R}^{M \times m}$. It is also straightforward to show that if $\rho(P) > 1$, then the HDC algorithm (17)
diverges for all $U(0) \notin \mathcal{N}(B)$, where $\mathcal{N}(B)$ is the null space of $B$. Clearly, the case $U(0) \in \mathcal{N}(B)$ is
not interesting as it leads to $X_\infty = 0$.
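The statements of Lemma 1 can be observed on a small simulated network. The following sketch iterates (17) from two different initial conditions and compares the result with the closed form (23); it also estimates the decay exponent of the error. The matrices `B`, `P`, `U0`, the network size, and the helper `run_hdc` are hypothetical values chosen for the illustration.

```python
import numpy as np

# Hypothetical network: K = 2 anchors, M = 3 sensors, m = 2 state components.
B = np.array([[0.3, 0.0],
              [0.0, 0.0],
              [0.0, 0.4]])
P = np.array([[0.2, 0.3, 0.0],
              [0.3, 0.2, 0.3],
              [0.0, 0.3, 0.2]])
U0 = np.array([[0.0, 0.0],
               [10.0, 4.0]])

rho = np.max(np.abs(np.linalg.eigvals(P)))
assert rho < 1                      # condition (22)

# Closed-form limit (23): X_inf = (I - P)^{-1} B U(0).
X_inf = np.linalg.solve(np.eye(3) - P, B @ U0)

def run_hdc(X0, steps):
    """Iterate X(t+1) = P X(t) + B U(0) and record ||E(t)|| at every step."""
    X = X0.copy()
    errors = []
    for _ in range(steps):
        errors.append(np.linalg.norm(X - X_inf, 2))
        X = P @ X + B @ U0
    return X, errors

rng = np.random.default_rng(0)
X_a, err_a = run_hdc(rng.normal(size=(3, 2)), 200)
X_b, _ = run_hdc(50.0 * rng.normal(size=(3, 2)), 200)

# Slope of ln||E(t)|| between t = 10 and t = 30, to compare with ln(rho(P)) in (24).
empirical_rate = np.log(err_a[30] / err_a[10]) / 20.0
print("ln rho(P):     ", np.log(rho))
print("empirical rate:", empirical_rate)
```

Both runs end at the same state, which reflects the observation above that the limit does not depend on the sensors' initial conditions.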

C. Consensus subspace

We now define the consensus subspace as follows.

Definition 2 (Consensus subspace) Given the matrices, $B \in \mathbb{R}^{M \times K}$ and $P \in \mathbb{R}^{M \times M}$, the consensus
subspace, $\Xi$, is defined as

\[ \Xi = \{ X_\infty \mid X_\infty = (I - P)^{-1} B U(0), \ U(0) \in \mathbb{R}^{K \times m}, \ \rho(P) < 1 \}. \tag{31} \]

The dimension of the consensus subspace, $\Xi$, is established in the following theorem.

Theorem 1 If $K < M$ and $\rho(P) < 1$, then the dimension of the consensus subspace, $\Xi$, is

\[ \dim(\Xi) = m \, \mathrm{rank}(B) \leq mK. \tag{32} \]

Proof: The proof follows from Lemma 1 and Lemma 10 in Appendix I. ■

Now, we formally define the dimension of the HDC.

Definition 3 (Dimension) The dimension of the HDC algorithm is the dimension of the consensus
subspace, $\Xi$, normalized by $m$, i.e.,

\[ \dim(\mathrm{HDC}) = \frac{\dim(\Xi)}{m} = \mathrm{rank}(B). \tag{33} \]

This definition is natural because the HDC is a decoupled algorithm, i.e., HDC corresponds to m parallel 
algorithms, one for each column of X(t). So, the number of columns, m, in X(t) is factored out in the 
definition of dim(HDC). Each column of X(t) lies in a subspace that is spanned by exactly rank(B) 
basis vectors. The number of these basis vectors is upper bounded by the number of anchors, i.e., is at 
most K. 

D. Practical Applications of the HDC 

Several interesting problems can be framed in the context of HDC. We briefly sketch them below; for
details, see [9], [11], [8], [10].

• Leader-follower algorithm [10]: When there is only one anchor, $K = 1$, the sensors' states converge
to the anchor state. With multiple anchors ($K > 1$), under appropriate conditions, the sensors' states
may be made to converge to a desired, pre-specified linear combination of the anchors' states.

• Sensor localization in $m$-dimensional Euclidean spaces, $\mathbb{R}^m$: In [8], we choose the elements of the
matrices $P = \{p_{lj}\}$ and $B = \{b_{lk}\}$ so that the sensor states converge to their exact locations when
only $K = m + 1$ anchors know their exact locations, for example, if equipped with a GPS.

• Jacobi algorithm for solving linear systems of equations [11]: Linear systems of equations arise
naturally in sensor networks, for example, power flow equations in power systems monitored by
sensors or time synchronization algorithms in sensor networks. With appropriate choice of the
matrices $B$ and $P$, it can be shown that the HDC algorithm (17) is a distributed implementation of
the Jacobi algorithm to solve the linear system.

• Distributed banded matrix inversion: Algorithm (17) followed by a non-linear collapse operator is 
employed in [9] to solve a banded matrix inversion problem, when the submatrices in the band are 
distributed among several sensors. This distributed inversion algorithm leads to distributed Kalman 
filters in sensor networks [20] using Gauss-Markov approximations by noting that the inverse of a 
Gauss-Markov covariance matrix is banded. 
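The leader-follower case in the list above can be sketched numerically. The example assumes a hypothetical line network with one anchor and three sensors, and it assumes weights for which every row of $[B \mid P]$ sums to one; under that assumption $(I - P)^{-1} B \mathbf{1} = \mathbf{1}$, so each sensor converges to the anchor state. The specific weights are illustrative and are not the design of [10].

```python
import numpy as np

# Hypothetical line network: anchor -> sensor 1 <-> sensor 2 <-> sensor 3.
B = np.array([[0.5],
              [0.0],
              [0.0]])
P = np.array([[0.25, 0.25, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Assumed design rule for this sketch: each row of [B | P] sums to one.
row_sums = np.hstack([B, P]).sum(axis=1)

U0 = np.array([[3.0, -1.0]])        # single anchor state, m = 2
X = np.zeros((3, 2))                # arbitrary initial sensor states
for _ in range(500):
    X = P @ X + B @ U0              # HDC iteration (17)

print(X)                            # every row approaches the anchor state
```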
E. Robustness of the HDC

Robustness is key in the context of HDC, when the information exchange is subject to communication
noise, packet drops, and imprecise knowledge of system parameters. In the context of sensor localization,
we propose a modification to HDC in [8] along the lines of the Robbins-Monro algorithm [21] where the
iterations are performed with a decreasing step-size sequence that satisfies a persistence condition, i.e., the
step-sizes converge to zero but not too fast (this condition is well studied in the stochastic approximation
literature, [21], [22]). With such step-sizes, we show almost sure convergence of the sensor localization
algorithm to the exact sensor locations under broad random phenomena, see [8] for details. This modification
is easily extended to the general class of HDC algorithms.

V. Inverse Problem: Learning in Large-Scale Networks

As we briefly mentioned before, the inverse problem learns the parameter matrices ($B$ and $P$) of
the HDC such that HDC converges to a desired pre-specified state (18). For convergence, we require
the spectral radius constraint (22), and the matrices, $B$ and $P$, to follow the underlying communication
network, $\mathcal{G}$. In general, due to the spectral radius constraint and the sparseness (network) constraints,
equation (18) may not be met with equality. So, it is natural to relax the learning problem. Using
Lemma 1 and (18), we restate the learning problem as follows.

Consider $\varepsilon \in [0, 1)$. Given an $N$-node sensor network with a communication graph, $\mathcal{G}$, and an $M \times K$
weight matrix, $W$, solve the optimization problem:

\[ \inf_{B, P} \ \left\| (I - P)^{-1} B - W \right\|, \tag{34} \]

\[ \text{subject to: Spectral radius constraint,} \quad \rho(P) \leq \varepsilon, \tag{35} \]

\[ \text{Sparsity constraint,} \quad \mathcal{G}^{\Upsilon} \subseteq \mathcal{G}, \tag{36} \]

for some induced matrix norm $\|\cdot\|$. By Lemma 1, if $\rho(P) \leq \varepsilon$, the convergence is exponential with
exponent less than or equal to $\ln(\varepsilon)$. Thus, we ask, given a pre-specified convergence rate, $\varepsilon$, what is
the minimum error between the converged estimates, $\lim_{t \to \infty} X(t)$, and the desired estimates, $W U(0)$.

Formulating the problem in this way naturally lends itself to a trade-off between the performance and 
the convergence rate. 

In some cases, it may happen that the learning problem has an exact solution in the sense that there
exist $B$, $P$, satisfying (35) and (36) such that the objective in (34) is 0. In case of multiple such solutions,
we seek the one which corresponds to the fastest convergence, i.e., which leads to the smallest value
of $\rho(P)$. We may still formulate a performance versus convergence rate trade-off, if faster convergence
is desired.

The learning problem stated as such is, in general, practically infeasible to solve because both (34)
and (35) are non-convex in $P$. We now develop a more tractable framework for the learning problem in
the following.

A. Revisiting the spectral radius constraint (35) 

We work with a convex relaxation of the spectral radius constraint. Recall that the spectral radius can 
be expressed as (6). However, direct use of (6) as a constraint is, in general, not computationally feasible. 
Hence, instead of using the spectral radius constraint (35), we use a matrix induced norm constraint by
realizing that

\[ \rho(P) \leq \|P\|, \tag{37} \]

for any matrix induced norm. The induced norm constraint, thus, becomes

\[ \|P\| \leq \varepsilon. \tag{38} \]

Clearly, (37) implies that any upper bound on $\|P\|$ is also an upper bound on $\rho(P)$.

B. Revisiting the sparsity constraint (36) 

In this subsection, we rewrite the sparsity constraint (36) as a linear constraint in the design
parameters, $B$ and $P$. The sparsity constraint ensures that the structure of the underlying communication
network, $\mathcal{G}$, is not violated. To this aim, we introduce an auxiliary variable, $F$, defined as

\[ F \triangleq [B \mid P] \in \mathbb{R}^{M \times N}. \tag{39} \]

This auxiliary variable, $F$, combines the matrices $B$ and $P$ as they correspond to the adjacency
matrix, $A(\mathcal{G})$, of the given communication graph, $\mathcal{G}$, see the comments after (16).

To translate the sparsity constraint into linear constraints on $F$ (and, thus, on $B$ and $P$), we employ a
two-step procedure: (i) First, we identify the elements in the adjacency matrix, $A(\mathcal{G})$, that are zero; these
elements correspond to the pairs of nodes in the network where we do not have a communication link.
(ii) We then force the elements of $F = [B \mid P]$ corresponding to zeros in the adjacency matrix, $A(\mathcal{G})$,
to be zero. Mathematically, (i) and (ii) can be described as follows.

(i) Let the lower $M \times N$ submatrix of the $N \times N$ adjacency matrix, $A = \{a_{lj}\}$ (this lower part
corresponds to $F = [B \mid P]$ as can be noted from (16)), be denoted by $\bar{A}$, i.e.,

\[ \bar{A} = \{\bar{a}_{ij}\} = \{a_{lj}\}, \qquad l = K+1, \ldots, N, \quad j = 1, \ldots, N, \quad i = 1, \ldots, M. \tag{40} \]

Let $\chi$ contain all pairs $(i, j)$ for which $\bar{a}_{ij} = 0$.

April 12, 2009 DRAFT

(ii) Let $\{\mathbf{e}_i\}_{i=1,\ldots,M}$ be a family of $1 \times M$ row-vectors such that $\mathbf{e}_i$ has a 1 as the $i$th element and
zeros everywhere else. Similarly, let $\{\mathbf{e}^j\}_{j=1,\ldots,N}$ be a family of $N \times 1$ column-vectors such that $\mathbf{e}^j$ has
a 1 as the $j$th element and zeros everywhere else. With this notation, the $ij$-th element, $f_{ij}$, of $F$ can be
written as

\[ f_{ij} = \mathbf{e}_i F \mathbf{e}^j. \tag{41} \]

The sparsity constraint (36) is explicitly given by

\[ \mathbf{e}_i F \mathbf{e}^j = 0, \qquad \forall (i, j) \in \chi. \tag{42} \]
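The two-step procedure can be sketched in a few lines. The example below builds the index set of forbidden entries from a hypothetical adjacency matrix and checks the linear constraints (42) on a candidate $F = [B \mid P]$. Two assumptions are made only for the illustration: indices are 0-based, and each sensor is allowed to use its own state (the $p_{ll}$ term in (12)) even though the example adjacency matrix has no self-loops. The helper names are hypothetical.

```python
import numpy as np

K, M = 1, 3          # hypothetical sizes: one anchor, three sensors
N = K + M

# Hypothetical adjacency matrix A = {a_lj}; a_lj = 1 means node j can send to node l.
# Node 0 is the anchor and nodes 1..3 are the sensors (0-based indices).
A = np.array([[0, 0, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# Step (i): lower M x N block of A, matching the layout F = [B | P].
A_bar = A[K:, :].copy()
# Assumption: a sensor may always reuse its own state (the p_ll term in (12)),
# so the diagonal of the P block is treated as an allowed entry.
A_bar[np.arange(M), K + np.arange(M)] = 1

# chi: index pairs (i, j) of F that must be zero.
chi = [(i, j) for i in range(M) for j in range(N) if A_bar[i, j] == 0]

def entry(F, i, j):
    """f_ij = e_i F e^j with a unit row vector e_i and a unit column vector e^j, as in (41)."""
    e_i = np.eye(M)[i:i + 1, :]     # 1 x M
    e_j = np.eye(N)[:, j:j + 1]     # N x 1
    return (e_i @ F @ e_j).item()

def satisfies_sparsity(F):
    """Step (ii): check the linear constraints (42)."""
    return all(entry(F, i, j) == 0 for (i, j) in chi)

def zero_forbidden_entries(F):
    """Set to zero every entry of F listed in chi."""
    return F * (A_bar != 0)

F_dense = np.arange(1.0, 13.0).reshape(M, N)
F_sparse = zero_forbidden_entries(F_dense)
B, P = F_sparse[:, :K], F_sparse[:, K:]
print(chi)
print(satisfies_sparsity(F_dense), satisfies_sparsity(F_sparse))
```

Because each constraint in (42) fixes a single entry of $F$ to zero, the constraint set is a linear subspace, which is what makes it easy to combine with the norm constraint.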



C. Feasible solutions 

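Consider $\varepsilon \in [0, 1)$. We now define a set of matrices, $\mathcal{F}_{\leq \varepsilon} \subseteq \mathbb{R}^{M \times N}$, that follow both the induced
norm constraint (38) and the sparsity constraint (42) of the learning problem. The set of feasible solutions
is given by

\[ \mathcal{F}_{\leq \varepsilon} = \{ F = [B \mid P] \ \mid \ \mathbf{e}_i F \mathbf{e}^j = 0, \ \forall (i, j) \in \chi, \ \text{and} \ \|F T\| \leq \varepsilon \}, \tag{43} \]

where

\[ T \triangleq \begin{bmatrix} 0_{K \times M} \\ I_M \end{bmatrix} \in \mathbb{R}^{N \times M}. \tag{44} \]

With the matrix $T$ defined as above, we note that

\[ P = F T. \tag{45} \]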

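Lemma 2 The set of feasible solutions, $\mathcal{F}_{\leq \varepsilon}$, is convex.

Proof: Let $F_1, F_2 \in \mathcal{F}_{\leq \varepsilon}$, then

\[ \mathbf{e}_i F_1 \mathbf{e}^j = 0, \quad \forall (i, j) \in \chi, \qquad \mathbf{e}_i F_2 \mathbf{e}^j = 0, \quad \forall (i, j) \in \chi. \tag{46} \]

For any $0 \leq \mu \leq 1$, and $\forall (i, j) \in \chi$,

\[ \mathbf{e}_i \left( \mu F_1 + (1 - \mu) F_2 \right) \mathbf{e}^j = \mu \, \mathbf{e}_i F_1 \mathbf{e}^j + (1 - \mu) \, \mathbf{e}_i F_2 \mathbf{e}^j = 0. \]

Similarly,

\[ \left\| \left( \mu F_1 + (1 - \mu) F_2 \right) T \right\| \leq \mu \|F_1 T\| + (1 - \mu) \|F_2 T\| \leq \mu \varepsilon + (1 - \mu) \varepsilon = \varepsilon. \]

The first inequality uses the triangle inequality for matrix induced norms and the second uses the fact
that, for $i = 1, 2$, $F_i \in \mathcal{F}_{\leq \varepsilon}$ and $\|F_i T\| \leq \varepsilon$.

Thus, $F_1, F_2 \in \mathcal{F}_{\leq \varepsilon} \Rightarrow \mu F_1 + (1 - \mu) F_2 \in \mathcal{F}_{\leq \varepsilon}$. Hence, $\mathcal{F}_{\leq \varepsilon}$ is convex. ■

Similarly, we note that the sets, $\mathcal{F}_{<1}$ and $\mathcal{F}_{\leq 1}$, are also convex.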



D. Learning Problem: An upper bound on the objective 

In this section, we simplify the objective function (34) and give a tractable upper bound. We have the
following proposition.

Proposition 2 Under the norm constraint $\|P\| < 1$,

\[ \left\| (I - P)^{-1} B - W \right\| \leq \frac{1}{1 - \|P\|} \left\| B + P W - W \right\|. \tag{47} \]

Proof: We manipulate (34) to obtain successively

\begin{align*}
\left\| (I - P)^{-1} B - W \right\| &= \left\| (I - P)^{-1} \left( B - (I - P) W \right) \right\|, \\
 &\leq \left\| (I - P)^{-1} \right\| \, \left\| B - (I - P) W \right\|, \\
 &= \Big\| \sum_{k} P^k \Big\| \, \left\| B - (I - P) W \right\|, \\
 &\leq \sum_{k} \|P\|^k \, \left\| B - (I - P) W \right\|, \\
 &= \frac{1}{1 - \|P\|} \left\| B + P W - W \right\|. \tag{48}
\end{align*}

To go from the second equation to the third, we use (112) from Lemma 9 in Appendix I. Lemma 9 is
applicable here since (37) and the norm constraint $\|P\| < 1$ imply $\rho(P) < 1$. The last step is the
sum of a geometric series, which converges given $\|P\| < 1$. ■

We now define the utility function, $u(B, P)$, that we minimize instead of minimizing $\|(I - P)^{-1} B - W\|$.
This is valid because $u(B, P)$ is an upper bound on (34) and hence minimizing the upper bound leads
to a performance guarantee. The utility function is

\[ u(B, P) = \frac{1}{1 - \|P\|} \left\| B + P W - W \right\|. \tag{49} \]
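A quick numerical check of the bound in Proposition 2 is sketched below. It draws random matrices with $\|P\|_2 < 1$ and compares the utility (49) with the error in (34), both measured in the induced 2-norm. The dimensions, the random draws, and the function names `true_error` and `utility` are assumptions for the illustration; the sparsity constraint is ignored here.

```python
import numpy as np

def true_error(B, P, W, norm_ord=2):
    """Objective (34): ||(I - P)^{-1} B - W|| in an induced norm."""
    M = P.shape[0]
    return np.linalg.norm(np.linalg.solve(np.eye(M) - P, B) - W, norm_ord)

def utility(B, P, W, norm_ord=2):
    """Utility (49): ||B + P W - W|| / (1 - ||P||), defined for ||P|| < 1."""
    p_norm = np.linalg.norm(P, norm_ord)
    if p_norm >= 1:
        raise ValueError("u(B, P) requires ||P|| < 1")
    return np.linalg.norm(B + P @ W - W, norm_ord) / (1.0 - p_norm)

rng = np.random.default_rng(1)
M, K = 4, 2                          # hypothetical sizes
W = rng.normal(size=(M, K))

gaps = []
for _ in range(100):
    P = rng.normal(size=(M, M))
    P *= rng.uniform(0.1, 0.9) / np.linalg.norm(P, 2)   # rescale so that ||P||_2 < 1
    B = rng.normal(size=(M, K))
    gaps.append(utility(B, P, W) - true_error(B, P, W))

# An exact solution B = (I - P) W makes both quantities vanish.
B_exact = (np.eye(M) - P) @ W
print("smallest gap u - error:", min(gaps))
print("exact solution:", utility(B_exact, P, W), true_error(B_exact, P, W))
```

The gap is nonnegative in every trial, consistent with (47), and both quantities are zero exactly when $B + PW - W = 0$.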
With the help of the previous development, we now formally present the Learning Problem.

Learning Problem: Given $\varepsilon \in [0, 1)$, an $N$-node sensor network with a sparse communication graph, $\mathcal{G}$,
and a possibly full $M \times K$ weight matrix, $W$, design the matrices $B$ and $P$ (in (17)) that minimize (49),
i.e., solve the optimization problem

\[ \inf_{[B \mid P] \in \mathcal{F}_{\leq \varepsilon}} u(B, P). \tag{50} \]

Note that the induced norm constraint (38) and the sparsity constraint (42) are implicit in (50), as they
appear in the set of feasible solutions, $\mathcal{F}_{\leq \varepsilon}$. Furthermore, the optimization problem in (50) is equivalent
to the following problem:

\[ \inf_{[B \mid P] \in \mathcal{F}_{\leq \varepsilon} \cap \{\|B\| \leq b\}} u(B, P), \tag{51} \]

where $b > 0$ is a sufficiently large number. Since (51) involves the infimum of a continuous
function, $u(B, P)$, over a compact set, $\mathcal{F}_{\leq \varepsilon} \cap \{\|B\| \leq b\}$, the infimum is attainable and, hence, in the subsequent
development, we replace the infimum in (50) by a minimum.

We view $\min u(B, P)$ as the minimization of its two factors, $1/(1 - \|P\|)$ and $\|B + PW - W\|$.
In general, we need $\|P\| \to 0$ to minimize the first factor, $1/(1 - \|P\|)$, and $\|P\| \to 1$ to minimize the
second factor, $\|B + PW - W\|$ (we explicitly prove this statement later). Hence, these two objectives
are conflicting. Since the minimization of the non-convex utility function, $u(B, P)$, contains minimizing
two coupled convex objective functions, $\|P\|$ and $\|B + PW - W\|$, we formulate this minimization as
a multi-objective optimization problem (MOP). In the MOP, we consider separately minimizing these
two convex functions. We then couple the MOP solutions using the utility function.



E. Solution to the Learning Problem: MOP formulation 

To solve the Learning Problem for every $\varepsilon \in [0, 1)$, we cast it in the context of a multi-objective
optimization problem (MOP). We start by a rigorous definition of the MOP and later consider its
equivalence to the Learning Problem. In the MOP formulation, we treat $\|B + PW - W\|$ as the first
objective function, $f_1$, and $\|P\|$ as the second objective function, $f_2$. The objective vector, $\mathbf{f}(B, P)$, is

\[ \mathbf{f}(B, P) = \begin{bmatrix} f_1(B, P) \\ f_2(B, P) \end{bmatrix} = \begin{bmatrix} \|B + PW - W\| \\ \|P\| \end{bmatrix}. \tag{52} \]
The multi-objective optimization problem (MOP) is given by

\[ \min_{[B \mid P] \in \mathcal{F}_{\leq 1}} \mathbf{f}(B, P), \tag{53} \]

where^2

\[ \mathcal{F}_{\leq 1} = \{ F = [B \mid P] \ : \ \mathbf{e}_i F \mathbf{e}^j = 0, \ \forall (i, j) \in \chi, \ \text{and} \ \|F T\| \leq 1 \}. \tag{54} \]

Before providing one of the main results of this paper on the equivalence of MOP and the Learning
Problem, we set the following notation. We define

\[ \varepsilon_{\text{exact}} = \min \{ \|P\| \ \mid \ (I - P)^{-1} B = W, \ [B \mid P] \in \mathcal{F}_{<1} \}, \tag{55} \]

where the minimum of an empty set is taken to be $+\infty$. In other words, $\varepsilon_{\text{exact}}$ is the minimum value
of $f_2 = \|P\|$ at which we may achieve an exact solution^3 of the Learning Problem. A necessary condition
for the existence of an exact solution is studied in Appendix II. If the exact solution is infeasible ($\notin \mathcal{F}_{<1}$),
then $\varepsilon_{\text{exact}} = \min\{\emptyset\}$, which we defined to be $+\infty$. We let

\[ \mathcal{E} = [0, 1) \cap [0, \varepsilon_{\text{exact}}]. \tag{56} \]

The Learning Problem is interesting if $\varepsilon \in \mathcal{E}$. We now study the relationship between the MOP and
the Learning Problem (50). Recall the notion of Pareto-optimal solutions of an MOP as discussed in
Section II-B. We have the following theorem.

Theorem 3 Let $B_\varepsilon, P_\varepsilon$ be an optimal solution of the Learning Problem, where $\varepsilon \in \mathcal{E}$. Then, $B_\varepsilon, P_\varepsilon$ is
a Pareto-optimal solution of the MOP (53).

The proof relies on analytical properties of the MOP (discussed in Section VI) and is deferred until
Section VI-C. We discuss here the consequences of Theorem 3. Theorem 3 says that the optimal solutions
to the Learning Problem can be obtained from the Pareto-optimal solutions of the MOP. In particular,
it suffices to generate the Pareto front (collection of Pareto-optimal solutions of the MOP) for the MOP
and seek the solutions to the Learning Problem from the Pareto front. The subsequent section is devoted
to constructing the Pareto front for the MOP and studying the properties of the Pareto front.

^2 Although the Learning Problem is valid only when $\|P\| < 1$, the MOP is defined at $\|P\| = 1$. Hence, we consider $\|P\| \leq 1$
when we seek the MOP solutions.

^3 An exact solution is given by $[B \mid P] \in \mathcal{F}_{<1}$ such that $(I - P)^{-1} B = W$, or when the infimum in (34) is attainable and is 0.
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VI. Multi-objective Optimization: Pareto Front 

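We consider the MOP (53) as an $\varepsilon$-constraint problem, denoted by $P_k(\varepsilon)$ [12]. For a two-objective
optimization, $n = 2$, we denote the $\varepsilon$-constraint problem as $P_1(\varepsilon_2)$ or $P_2(\varepsilon_1)$, where $P_1(\varepsilon_2)$ is given by

$$\min_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le 1}} f_1(\mathbf{B},\mathbf{P}) \quad \text{subject to} \quad f_2(\mathbf{B},\mathbf{P}) \le \varepsilon_2, \tag{57}$$

and $P_2(\varepsilon_1)$ is given by

$$\min_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le 1}} f_2(\mathbf{B},\mathbf{P}) \quad \text{subject to} \quad f_1(\mathbf{B},\mathbf{P}) \le \varepsilon_1. \tag{58}$$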

In both $P_1(\varepsilon_2)$ and $P_2(\varepsilon_1)$, we are minimizing a real-valued convex function, subject to a constraint
on a real-valued convex function, over a convex feasible set. Hence, either optimization can be solved
using a convex program [23]. We can now write $\varepsilon_{\text{exact}}$ in terms of $P_2(\varepsilon_1)$ as

$$\varepsilon_{\text{exact}} = \begin{cases} P_2(0), & \text{if there exists a solution to } P_2(0), \\ +\infty, & \text{otherwise.} \end{cases} \tag{59}$$



Using $P_1(\varepsilon_2)$, we find the Pareto-optimal set of solutions of the MOP. We explore this in Section VI-A.
The collection of the values of the functions, $f_1$ and $f_2$, at the Pareto-optimal solutions forms the Pareto
front (formally defined in Section VI-B). We explore properties of the Pareto front, in the context of our
learning problem, in Section VI-B. These properties will be useful in addressing the minimization in (50)
for solving the Learning Problem.

A. Pareto-Optimal Solutions 

In general, obtaining Pareto-optimal solutions requires iteratively solving $\varepsilon$-constraint problems [12],
but we will show that the optimization problem, $P_1(\varepsilon_2)$, results directly in a Pareto-optimal solution.
To do this, we provide Lemma 3 and its Corollary 1 in the following. Based on these, we then state the
Pareto-optimality of the solutions of $P_1(\varepsilon_2)$ in Theorem 4.

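Lemma 3 Let

$$[\mathbf{B}_0 \mid \mathbf{P}_0] = \operatorname*{argmin}_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le 1}} P_1(\varepsilon_0). \tag{60}$$

If $\varepsilon_0 \in \mathcal{E}$, then the minimum of the optimization, $P_1(\varepsilon_0)$, is attained at $\varepsilon_0$, i.e.,

$$f_2(\mathbf{B}_0,\mathbf{P}_0) = \varepsilon_0. \tag{61}$$

Footnote (to (57) and (58)): All the infima can be replaced by minima in a similar way as justified in Section V-D. Further note that,
for technical convenience, we use $\mathcal{F}_{\le 1}$ and not $\mathcal{F}_{<1}$, which is permissible because the MOP objectives are
defined for all values of $\|\mathbf{P}\|$.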

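Proof: Let the minimum value of the objective, $f_1$, be denoted by $\delta_0$, i.e.,

$$\delta_0 = f_1(\mathbf{B}_0,\mathbf{P}_0). \tag{62}$$

We prove the claim by contradiction. Assume, on the contrary, that $\|\mathbf{P}_0\| = \varepsilon' < \varepsilon_0$. Define

$$\alpha_0 = \frac{1 - \varepsilon_0}{1 - \varepsilon'}. \tag{63}$$

Since $\varepsilon' < \varepsilon_0 < 1$, we have $0 < \alpha_0 < 1$. For $\alpha_0 \le \alpha < 1$, define another pair, $\mathbf{B}_1, \mathbf{P}_1$, as

$$\mathbf{B}_1 \triangleq \alpha\mathbf{B}_0, \qquad \mathbf{P}_1 \triangleq (1-\alpha)\mathbf{I} + \alpha\mathbf{P}_0. \tag{64}$$

Clearly, this choice is feasible, as it does not violate the sparsity constraints of the problem and further
lies in the constraint of the optimization in (60), since

$$\|\mathbf{P}_1\| \le (1-\alpha) + \alpha\varepsilon' = 1 - \alpha(1-\varepsilon') \le 1 - \alpha_0(1-\varepsilon') = \varepsilon_0. \tag{65}$$

With the matrices $\mathbf{B}_1, \mathbf{P}_1$ in (64), we have the following value, $\delta_1$, of the objective function, $f_1$:

$$\begin{aligned}
\delta_1 &= \|\mathbf{B}_1 + \mathbf{P}_1\mathbf{W} - \mathbf{W}\|, \\
&= \|\alpha\mathbf{B}_0 + ((1-\alpha)\mathbf{I} + \alpha\mathbf{P}_0)\mathbf{W} - \mathbf{W}\|, \\
&= \|\alpha\mathbf{B}_0 + \alpha\mathbf{P}_0\mathbf{W} - \alpha\mathbf{W}\|, \\
&= \alpha f_1(\mathbf{B}_0,\mathbf{P}_0) = \alpha\delta_0.
\end{aligned} \tag{66}$$

Since $0 \le \alpha < 1$, we have $\delta_1 < \delta_0$ (here $\delta_0 > 0$, because $\delta_0 = 0$ would give an exact solution with
$\|\mathbf{P}_0\| = \varepsilon' < \varepsilon_{\text{exact}}$, contradicting (55)). This shows that the new pair, $\mathbf{B}_1, \mathbf{P}_1$, constructed from
the pair, $\mathbf{B}_0, \mathbf{P}_0$, results in a lower value of the objective function. Hence, the pair, $\mathbf{B}_0, \mathbf{P}_0$, with $\|\mathbf{P}_0\| =
\varepsilon' < \varepsilon_0$ is not optimal, which is a contradiction. Hence, $f_2(\mathbf{B}_0,\mathbf{P}_0) = \varepsilon_0$. ■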
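The construction in (63)-(66) can be checked numerically. The following is a minimal sketch, not part of the original development: the 2x2 matrices are hypothetical, the induced infinity norm stands in for the matrix norm, and diagonal entries of $\mathbf{P}$ are assumed to be allowed by the sparsity pattern.

```python
# Sketch: numerically check the scaling construction (64)-(66).
# Assumptions (not from the paper): hypothetical 2x2 matrices, the induced
# infinity norm (maximum absolute row sum), diagonal entries of P allowed.

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def norm_inf(X):
    return max(sum(abs(x) for x in row) for row in X)

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def f1(B, P, W):
    """Objective f1(B, P) = ||B + P W - W||."""
    return norm_inf(mat_add(mat_add(B, mat_mul(P, W)), mat_scale(-1.0, W)))

def f2(P):
    """Objective f2(B, P) = ||P||."""
    return norm_inf(P)

W = [[0.6, 0.4], [0.3, 0.7]]     # hypothetical target weight matrix
B0 = [[0.5, 0.0], [0.0, 0.2]]    # hypothetical B
P0 = [[0.1, 0.2], [0.2, 0.0]]    # ||P0|| plays the role of eps'
eps0 = 0.5                       # constraint level with eps' < eps0 < 1

eps_prime = f2(P0)
alpha0 = (1.0 - eps0) / (1.0 - eps_prime)   # equation (63)
alpha = 0.5 * (alpha0 + 1.0)                # any alpha0 <= alpha < 1

B1 = mat_scale(alpha, B0)                   # equation (64)
P1 = mat_add(mat_scale(1.0 - alpha, identity(2)), mat_scale(alpha, P0))

delta0 = f1(B0, P0, W)
delta1 = f1(B1, P1, W)
print(f2(P1) <= eps0, delta1, alpha * delta0)
```

The pair built from (64) stays within the constraint level and scales the objective by exactly $\alpha$, which is the step that produces the contradiction in the proof.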

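Lemma 3 shows that if a pair of matrices, $\mathbf{B}_0, \mathbf{P}_0$, solves the optimization problem $P_1(\varepsilon_0)$ with $\varepsilon_0 \in \mathcal{E}$,
then the pair of matrices, $\mathbf{B}_0, \mathbf{P}_0$, meets the constraint on $f_2$ with equality, i.e., $f_2(\mathbf{B}_0,\mathbf{P}_0) = \varepsilon_0$. The
following corollary follows from Lemma 3.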
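Corollary 1 Let $\varepsilon_0 \in \mathcal{E}$, and

$$[\mathbf{B}_0 \mid \mathbf{P}_0] = \operatorname*{argmin}_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le 1}} P_1(\varepsilon_0), \tag{67}$$
$$\delta_0 = f_1(\mathbf{B}_0,\mathbf{P}_0). \tag{68}$$

Then,

$$\delta_0 < \delta_\varepsilon, \tag{69}$$

for any $\varepsilon < \varepsilon_0$, where

$$[\mathbf{B}_\varepsilon \mid \mathbf{P}_\varepsilon] = \operatorname*{argmin}_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le 1}} P_1(\varepsilon), \tag{70}$$
$$\delta_\varepsilon = f_1(\mathbf{B}_\varepsilon,\mathbf{P}_\varepsilon). \tag{71}$$

Proof: Any solution of $P_1(\varepsilon)$ with $\varepsilon < \varepsilon_0$ is feasible for $P_1(\varepsilon_0)$, so $\delta_0 \le \delta_\varepsilon$. Equality would make it
a minimizer of $P_1(\varepsilon_0)$ with $f_2 \le \varepsilon < \varepsilon_0$, which Lemma 3 rules out. Hence, there does not exist any $\varepsilon < \varepsilon_0$
that results in a value of the objective function, $f_1$, as low as $\delta_0$. ■
The above corollary shows that the optimal value of $f_1$ obtained by solving $P_1(\varepsilon)$ is strictly greater than
the optimal value of $f_1$ obtained by solving $P_1(\varepsilon_0)$ for any $\varepsilon < \varepsilon_0$.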

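The following theorem now establishes the Pareto-optimality of the solutions of $P_1(\varepsilon)$.

Theorem 4 If $\varepsilon_0 \in \mathcal{E}$, then the solution, $\mathbf{B}_0, \mathbf{P}_0$, of the optimization problem, $P_1(\varepsilon_0)$, is Pareto-optimal.

Proof: Since $\mathbf{B}_0, \mathbf{P}_0$ solves the optimization problem, $P_1(\varepsilon_0)$, we have $\|\mathbf{P}_0\| = \varepsilon_0$, from Lemma 3.
Assume, on the contrary, that $\mathbf{B}_0, \mathbf{P}_0$ are not Pareto-optimal. Then, by definition of Pareto-optimality,
there exists a feasible $\mathbf{B}, \mathbf{P}$, with

$$f_1(\mathbf{B},\mathbf{P}) \le f_1(\mathbf{B}_0,\mathbf{P}_0), \tag{72}$$
$$f_2(\mathbf{B},\mathbf{P}) \le f_2(\mathbf{B}_0,\mathbf{P}_0), \tag{73}$$

with strict inequality in at least one of the above equations. Clearly, if $f_2(\mathbf{B},\mathbf{P}) < f_2(\mathbf{B}_0,\mathbf{P}_0)$, then $\|\mathbf{P}\| <
\varepsilon_0$ and $\mathbf{B}, \mathbf{P}$ are feasible for $P_1(\varepsilon_0)$. By Corollary 1, we have $f_1(\mathbf{B},\mathbf{P}) > f_1(\mathbf{B}_0,\mathbf{P}_0)$, which violates (72).
Hence, $f_2(\mathbf{B},\mathbf{P}) < f_2(\mathbf{B}_0,\mathbf{P}_0)$ is not possible.

On the other hand, if $f_1(\mathbf{B},\mathbf{P}) < f_1(\mathbf{B}_0,\mathbf{P}_0)$, then we contradict the fact that $\mathbf{B}_0, \mathbf{P}_0$ are optimal
for $P_1(\varepsilon_0)$, since by (73), $\mathbf{B}, \mathbf{P}$ is also feasible for $P_1(\varepsilon_0)$.

Thus, either way, we have a contradiction and $\mathbf{B}_0, \mathbf{P}_0$ are Pareto-optimal. ■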
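As an illustration of Lemma 3, Corollary 1, and Theorem 4, the sketch below solves $P_1(\varepsilon)$ by brute force on a toy scalar instance. The instance is an assumption made for illustration only: a single sensor, $\mathbf{B}$ forced to 0 by the sparsity pattern, and a scalar $\mathbf{W} = w \neq 0$, so that no exact solution exists. The grid search is a stand-in for the convex program mentioned above, not the authors' method.

```python
# Toy scalar instance (an assumption for illustration, not from the paper):
# one sensor, B forced to 0 by sparsity, W = w != 0, so eps_exact = +infinity.

def f1(b, p, w):
    """f1 = |b + p*w - w| in the scalar case."""
    return abs(b + p * w - w)

def f2(p):
    """f2 = |p| in the scalar case."""
    return abs(p)

def solve_p1(eps, w, grid=2001):
    """Brute-force P1(eps): minimize f1 over p in [-eps, eps] with b = 0."""
    best_p, best_val = 0.0, f1(0.0, 0.0, w)
    for i in range(grid):
        p = -eps + 2.0 * eps * i / (grid - 1)
        val = f1(0.0, p, w)
        if val < best_val - 1e-15:
            best_p, best_val = p, val
    return best_p, best_val

w = 2.0
eps_values = [0.1 * k for k in range(10)]   # 0.0, 0.1, ..., 0.9
front = [(eps, *solve_p1(eps, w)) for eps in eps_values]

for eps, p, delta in front:
    print(f"eps={eps:.1f}  |p|={f2(p):.3f}  delta={delta:.3f}")
```

In this toy case the minimizer always sits on the constraint boundary, $|p| = \varepsilon$, and the minimum value is $|w|(1-\varepsilon)$, which strictly decreases in $\varepsilon$.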
B. Properties of the Pareto Front

In this section, we formally introduce the Pareto front and explore some of its properties in the context
of the Learning Problem. The Pareto front and its properties are essential for the minimization of the
utility function, $u(\mathbf{B},\mathbf{P})$, in (50), as introduced in Section V-D.

Let $\bar{\mathcal{E}}$ denote the closure of $\mathcal{E}$. The Pareto front is defined as follows.

Definition 4 [Pareto front] Consider $\varepsilon \in \bar{\mathcal{E}}$. Let $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ be a solution of $P_1(\varepsilon)$; then $\varepsilon = f_2(\mathbf{B}_\varepsilon,\mathbf{P}_\varepsilon)$.
Denote $\delta = f_1(\mathbf{B}_\varepsilon,\mathbf{P}_\varepsilon)$. The collection of all such $(\varepsilon, \delta)$ is defined as the Pareto front.

For a given $\varepsilon \in \bar{\mathcal{E}}$, define $\delta(\varepsilon)$ to be the minimum of the objective function, $f_1$, in $P_1(\varepsilon)$. By
Theorem 4, $(\varepsilon, \delta(\varepsilon))$ is a point on the Pareto front. We now view the Pareto front as a function, $\delta :
\bar{\mathcal{E}} \to \mathbb{R}_+$, which maps every $\varepsilon \in \bar{\mathcal{E}}$ to the corresponding $\delta(\varepsilon)$. In the following development, we use
the Pareto front, as defined in Definition 4, and the function, $\delta$, interchangeably. The following lemmas
establish properties of the Pareto front.

Lemma 4 The Pareto front is strictly decreasing.

Proof: The proof follows from Corollary 1. ■

Lemma 5 The Pareto front is convex and continuous, and its left and right derivatives exist at each point
on the Pareto front. Also, when $\varepsilon_{\text{exact}} = +\infty$, we have

$$\delta(1) = \lim_{\varepsilon \to 1} \delta(\varepsilon) = 0. \tag{74}$$

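Proof: Let $\varepsilon = f_2(\cdot)$ be the horizontal axis of the Pareto front, and let $\delta(\varepsilon) = f_1(\cdot)$ be the
vertical axis. By definition of the Pareto front, for each pair $(\varepsilon, \delta(\varepsilon))$ on the Pareto front, there exist
matrices $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ such that

$$\|\mathbf{P}_\varepsilon\| = \varepsilon, \quad \text{and} \quad \|\mathbf{B}_\varepsilon + \mathbf{P}_\varepsilon\mathbf{W} - \mathbf{W}\| = \delta(\varepsilon). \tag{75}$$

Let $(\varepsilon_1, \delta(\varepsilon_1))$ and $(\varepsilon_2, \delta(\varepsilon_2))$ be two points on the Pareto front, such that $\varepsilon_1 < \varepsilon_2$. Then, there
exist $\mathbf{B}_1, \mathbf{P}_1$, and $\mathbf{B}_2, \mathbf{P}_2$, such that

$$\|\mathbf{P}_1\| = \varepsilon_1, \quad \text{and} \quad \|\mathbf{B}_1 + \mathbf{P}_1\mathbf{W} - \mathbf{W}\| = \delta(\varepsilon_1), \tag{76}$$
$$\|\mathbf{P}_2\| = \varepsilon_2, \quad \text{and} \quad \|\mathbf{B}_2 + \mathbf{P}_2\mathbf{W} - \mathbf{W}\| = \delta(\varepsilon_2). \tag{77}$$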

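Footnote (to Definition 4): The equality $\varepsilon = f_2(\mathbf{B}_\varepsilon,\mathbf{P}_\varepsilon)$ follows from Lemma 3. Also, note that since $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ is a solution of $P_1(\varepsilon)$, $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ is Pareto-optimal from Theorem 4.
Footnote (to Lemma 5): At $\varepsilon = 0$, only the right derivative is defined and at $\varepsilon = \sup \mathcal{E}$, only the left derivative is defined.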
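For some $0 \le \mu \le 1$, define

$$\mathbf{B}_3 = \mu\mathbf{B}_1 + (1-\mu)\mathbf{B}_2, \tag{78}$$
$$\mathbf{P}_3 = \mu\mathbf{P}_1 + (1-\mu)\mathbf{P}_2. \tag{79}$$

Clearly, $[\mathbf{B}_3 \mid \mathbf{P}_3] \in \mathcal{F}_{\le 1}$ as the sparsity constraint is not violated and

$$\|\mathbf{P}_3\| \le \mu\|\mathbf{P}_1\| + (1-\mu)\|\mathbf{P}_2\| \le 1, \tag{80}$$

since $\|\mathbf{P}_1\| \le 1$ and $\|\mathbf{P}_2\| \le 1$. Let

$$\varepsilon_3 = \|\mathbf{P}_3\|, \tag{81}$$

and let

$$z(\varepsilon_3) = \|\mathbf{B}_3 + \mathbf{P}_3\mathbf{W} - \mathbf{W}\|. \tag{82}$$

We have

$$\begin{aligned}
z(\varepsilon_3) &= \|\mu\mathbf{B}_1 + (1-\mu)\mathbf{B}_2 + (\mu\mathbf{P}_1 + (1-\mu)\mathbf{P}_2)\mathbf{W} - \mathbf{W}\|, \\
&= \|\mu\mathbf{B}_1 + \mu\mathbf{P}_1\mathbf{W} - \mu\mathbf{W} + (1-\mu)\mathbf{B}_2 + (1-\mu)\mathbf{P}_2\mathbf{W} - (1-\mu)\mathbf{W}\|, \\
&\le \mu\|\mathbf{B}_1 + \mathbf{P}_1\mathbf{W} - \mathbf{W}\| + (1-\mu)\|\mathbf{B}_2 + \mathbf{P}_2\mathbf{W} - \mathbf{W}\|, \\
&= \mu\delta(\varepsilon_1) + (1-\mu)\delta(\varepsilon_2).
\end{aligned} \tag{83}$$

Since $(\varepsilon_3, z(\varepsilon_3))$ may not be Pareto-optimal, there exists a Pareto-optimal point, $(\varepsilon_3, \delta(\varepsilon_3))$, at $\varepsilon_3$ (from
Lemma 3) and we have

$$\delta(\varepsilon_3) \le z(\varepsilon_3) \le \mu\delta(\varepsilon_1) + (1-\mu)\delta(\varepsilon_2). \tag{84}$$

From (80), we have

$$\varepsilon_3 \le \mu\varepsilon_1 + (1-\mu)\varepsilon_2, \tag{85}$$

and since the Pareto front is strictly decreasing (from Lemma 4), we have

$$\delta(\mu\varepsilon_1 + (1-\mu)\varepsilon_2) \le \delta(\varepsilon_3). \tag{86}$$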
From (86) and (84), we have

$$\delta(\mu\varepsilon_1 + (1-\mu)\varepsilon_2) \le \mu\delta(\varepsilon_1) + (1-\mu)\delta(\varepsilon_2), \tag{87}$$

which establishes convexity of the Pareto front. Since the Pareto front is convex, it is continuous, and it
has left and right derivatives [24].

Clearly, $\delta(1) = \lim_{\varepsilon \to 1} \delta(\varepsilon)$ by continuity of the Pareto front. By choosing $\mathbf{P} = \mathbf{I}$ and $\mathbf{B} = \mathbf{0}$, we
have $\delta(1) = 0$. Note that $(1,0)$ lies on the Pareto front when $\varepsilon_{\text{exact}} = +\infty$. Indeed, for any $\mathbf{B}, \mathbf{P}$
satisfying the sparsity constraints, we cannot simultaneously have

$$\|\mathbf{P}\| \le 1, \tag{88}$$
$$\text{and} \quad \|\mathbf{B} + \mathbf{P}\mathbf{W} - \mathbf{W}\| \le 0, \tag{89}$$

with strict inequality in at least one of the above equations. Thus, the pair $\mathbf{B} = \mathbf{0}$, $\mathbf{P} = \mathbf{I}$ is Pareto-optimal,
leading to $\delta(1) = 0$. ■
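The two inequalities that drive the convexity argument, (80) and (83), hold for any two feasible pairs, which can be confirmed on an example. The sketch below is illustrative only: the matrices are hypothetical (they are not claimed to be Pareto-optimal) and the infinity norm is assumed.

```python
# Sketch: check (80) and (83) for convex combinations of two feasible pairs.
# Assumptions: hypothetical 2x2 matrices and the induced infinity norm.

def norm_inf(X):
    return max(sum(abs(x) for x in row) for row in X)

def combine(mu, X, Y):
    """Entrywise convex combination mu*X + (1 - mu)*Y, as in (78)-(79)."""
    return [[mu * x + (1.0 - mu) * y for x, y in zip(rx, ry)]
            for rx, ry in zip(X, Y)]

def f1(B, P, W):
    """f1(B, P) = ||B + P W - W|| for square matrices of equal size."""
    n = len(W)
    PW = [[sum(P[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return norm_inf([[B[i][j] + PW[i][j] - W[i][j] for j in range(n)]
                     for i in range(n)])

W = [[0.6, 0.4], [0.3, 0.7]]
B1, P1 = [[0.5, 0.0], [0.0, 0.2]], [[0.1, 0.2], [0.2, 0.0]]
B2, P2 = [[0.1, 0.0], [0.0, 0.3]], [[0.5, 0.3], [0.1, 0.6]]

checks = []
for mu in (0.0, 0.25, 0.5, 0.75, 1.0):
    B3, P3 = combine(mu, B1, B2), combine(mu, P1, P2)
    norm_ok = norm_inf(P3) <= mu * norm_inf(P1) + (1 - mu) * norm_inf(P2) + 1e-12
    obj_ok = f1(B3, P3, W) <= mu * f1(B1, P1, W) + (1 - mu) * f1(B2, P2, W) + 1e-12
    checks.append((mu, norm_ok, obj_ok))

print(checks)
```

Both bounds follow from the triangle inequality, so they hold for every $\mu \in [0,1]$; the proof then replaces the combined point by the Pareto-optimal point at the same $\varepsilon_3$.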

C. Proof of Theorem 3 

With the Pareto-optimal solutions of MOP established in Section VI-A and the properties of the Pareto 
front in Section VI-B, we now prove Theorem 3. 

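Proof: We prove the theorem by contradiction. Let $\|\mathbf{P}_\varepsilon\| = \varepsilon' \le \varepsilon$, and $\delta' = \|\mathbf{B}_\varepsilon + \mathbf{P}_\varepsilon\mathbf{W} - \mathbf{W}\|$.
Assume, on the contrary, that $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ is not Pareto-optimal. From Lemma 3, there exists a Pareto-optimal
solution $\mathbf{B}^*, \mathbf{P}^*$, at $\varepsilon'$, such that

$$\|\mathbf{P}^*\| = \varepsilon', \quad \text{and} \quad \delta(\varepsilon') = \|\mathbf{B}^* + \mathbf{P}^*\mathbf{W} - \mathbf{W}\|, \tag{90}$$

with $\delta(\varepsilon') < \delta'$, since $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ is not Pareto-optimal. Since $\|\mathbf{P}^*\| = \|\mathbf{P}_\varepsilon\| = \varepsilon' \le \varepsilon$, the Pareto-optimal
solution, $\mathbf{B}^*, \mathbf{P}^*$, is feasible for the Learning Problem. In this case, the utility function for the Pareto-optimal
solution, $\mathbf{B}^*, \mathbf{P}^*$, is

$$\begin{aligned}
u(\mathbf{B}^*,\mathbf{P}^*) &= \frac{1}{1 - \|\mathbf{P}^*\|}\|\mathbf{B}^* + \mathbf{P}^*\mathbf{W} - \mathbf{W}\|, && (91) \\
&= \frac{\delta(\varepsilon')}{1 - \varepsilon'}, && (92) \\
&< \frac{\delta'}{1 - \varepsilon'}, && (93) \\
&= u(\mathbf{B}_\varepsilon,\mathbf{P}_\varepsilon). && (94)
\end{aligned}$$

Hence, $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$ is not an optimal solution of the Learning Problem, which is a contradiction. Hence, $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$
is Pareto-optimal. ■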
The above theorem suggests that it suffices to find the optimal solutions of the Learning Problem from
the set of Pareto-optimal solutions, i.e., the Pareto front. The next section addresses the minimization of
the utility function, $u(\mathbf{B},\mathbf{P})$, and formulates the performance-convergence rate trade-offs.

VII. Minimization of the utility function 

In this section, we develop the solution of the Learning Problem from the Pareto front. The solution 
of the Learning Problem (50) lies on the Pareto front as already established in Theorem 3. Hence, it 
suffices to choose a Pareto-optimal solution from the Pareto front that minimizes (50) under the given 
constraints. In the following, we study properties of the utility function. 

A. Properties of the utility function 

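With the help of Theorem 3, we now restrict the utility function to the Pareto-optimal solutions. By
Lemma 3, for every $\varepsilon \in \mathcal{E}$, there exists a Pareto-optimal solution, $\mathbf{B}_\varepsilon, \mathbf{P}_\varepsilon$, with

$$\|\mathbf{P}_\varepsilon\| = \varepsilon, \quad \text{and} \quad \|\mathbf{B}_\varepsilon + \mathbf{P}_\varepsilon\mathbf{W} - \mathbf{W}\| = \delta(\varepsilon). \tag{95}$$

Also, we note that, for any Pareto-optimal solution, $\mathbf{B}, \mathbf{P}$, the corresponding utility function is

$$u(\mathbf{B},\mathbf{P}) = \frac{\|\mathbf{B} + \mathbf{P}\mathbf{W} - \mathbf{W}\|}{1 - \|\mathbf{P}\|} = \frac{\delta(\|\mathbf{P}\|)}{1 - \|\mathbf{P}\|}. \tag{96}$$

This permits us to redefine the utility function as $u^* : \mathcal{E} \to \mathbb{R}_+$, such that, for any Pareto-optimal
solution, $\mathbf{B}, \mathbf{P}$,

$$u(\mathbf{B},\mathbf{P}) = u^*(\|\mathbf{P}\|). \tag{97}$$

We establish properties of $u^*$, which enable determining the solutions of the Learning Problem.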
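Lemma 6 The function $u^*(\varepsilon)$, for $\varepsilon \in \mathcal{E}$, is non-increasing, i.e., for $\varepsilon_1, \varepsilon_2 \in \mathcal{E}$ with $\varepsilon_1 < \varepsilon_2$, we have

$$u^*(\varepsilon_2) \le u^*(\varepsilon_1). \tag{98}$$

Hence,

$$\min_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le \varepsilon}} u(\mathbf{B},\mathbf{P}) = u^*(\varepsilon). \tag{99}$$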

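Footnote (to the first sentence of Section VII-A): Note that when $\varepsilon_{\text{exact}} = +\infty$, the solution $\mathbf{B} = \mathbf{0}$, $\mathbf{P} = \mathbf{I}$ is Pareto-optimal, but the utility function is undefined here,
although the MOP is well-defined. Hence, for the utility function, we consider only the Pareto-optimal solutions with $\|\mathbf{P}\|$ in $\mathcal{E}$.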
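Proof: Consider $\varepsilon_1, \varepsilon_2 \in \mathcal{E}$ such that $\varepsilon_1 < \varepsilon_2$; then $\varepsilon_2$ is a convex combination of $\varepsilon_1$ and 1, i.e.,
there exists a $0 < \mu < 1$ such that

$$\varepsilon_2 = \mu\varepsilon_1 + (1 - \mu). \tag{100}$$

From Lemma 3, there exist $\delta(\varepsilon_1)$ and $\delta(\varepsilon_2)$ on the Pareto front corresponding to $\varepsilon_1$ and $\varepsilon_2$, respectively.
Since the Pareto front is convex (from Lemma 5), we have

$$\delta(\varepsilon_2) \le \mu\delta(\varepsilon_1) + (1-\mu)\delta(1). \tag{101}$$

Recall that $\delta(1) = 0$; we have

$$\frac{\delta(\varepsilon_2)}{1 - \varepsilon_2} \le \frac{\mu\delta(\varepsilon_1)}{1 - \mu\varepsilon_1 - (1-\mu)} = \frac{\delta(\varepsilon_1)}{1 - \varepsilon_1}, \tag{102}$$

and (98) follows.
We now have

$$\begin{aligned}
\min_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le \varepsilon}} u(\mathbf{B},\mathbf{P})
&= \min_{[\mathbf{B} \mid \mathbf{P}] \in \mathcal{F}_{\le \varepsilon} \text{ and } (\mathbf{B},\mathbf{P}) \text{ is Pareto-optimal}} u(\mathbf{B},\mathbf{P}), \\
&= \min_{\|\mathbf{P}\| \le \varepsilon \text{ and } (\mathbf{B},\mathbf{P}) \text{ is Pareto-optimal}} u(\mathbf{B},\mathbf{P}), \\
&= \min_{0 \le \varepsilon' \le \varepsilon} u^*(\varepsilon'), \\
&= u^*(\varepsilon).
\end{aligned} \tag{103}$$

The first step follows from Theorem 3. The second step is just a restatement since the sparsity constraints
are included in the MOP. The third step follows from the definition of $u^*$ and finally, we use the
non-increasing property of $u^*$ to get the last equation. ■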
We now study the cost of the utility function. From Lemma 6, we note that this cost is non-increasing
as $\varepsilon$ increases. When $\varepsilon_{\text{exact}} < 1$, this cost reaches 0 at $\varepsilon = \varepsilon_{\text{exact}}$. When $\varepsilon_{\text{exact}} = +\infty$, we may be able to decrease the
cost as $\varepsilon \to 1$. We now define the limiting cost.
Definition 5 [Infimum cost] The infimum cost, $C_{\inf}$, of the utility function is defined as

$$C_{\inf} = \begin{cases} \lim_{\varepsilon \to 1} u^*(\varepsilon), & \text{if } \varepsilon_{\text{exact}} = +\infty, \\ 0, & \text{otherwise.} \end{cases} \tag{104}$$



Clearly, the cost does not increase as $\varepsilon \to 1$ from Lemma 6. If $\varepsilon_{\text{exact}} = +\infty$, the utility function,
$u^*(\varepsilon)$, can take a value as close as desired to $C_{\inf}$, but it cannot attain $C_{\inf}$, since $u^*(\varepsilon)$ is undefined
at $\|\mathbf{P}\| = 1$. The following lemma establishes the cost of the utility function, $u^*(\varepsilon)$, as $\varepsilon \to 1$.

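Lemma 7 If $\varepsilon_{\text{exact}} = +\infty$, then the infimum cost, $C_{\inf}$, is the negative of the left derivative, $D^-(\delta(\varepsilon))$,
of the Pareto front evaluated at $\varepsilon = 1$.

Proof: Recall that $\delta(1) = 0$. Then $C_{\inf}$ is given by

$$\begin{aligned}
C_{\inf} &= \lim_{\varepsilon \to 1} u^*(\varepsilon), \\
&= \lim_{\varepsilon \to 1} \frac{\delta(\varepsilon)}{1 - \varepsilon}, \\
&= \lim_{\varepsilon \to 1} \frac{\delta(\varepsilon) - \delta(1)}{1 - \varepsilon}, \\
&= -D^-(\delta(\varepsilon))\big|_{\varepsilon = 1}.
\end{aligned} \tag{105}$$ ■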
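Lemma 7 can be illustrated on a closed-form front. The sketch below assumes a hypothetical Pareto front, $\delta(\varepsilon) = (1-\varepsilon)(2-\varepsilon)$, chosen only because it is convex, strictly decreasing on $[0,1]$, and satisfies $\delta(1) = 0$; it does not come from an actual HDC instance.

```python
# Sketch: compare u*(eps) near eps = 1 with the negative left derivative of
# the front at eps = 1. The front below is a hypothetical example.

def delta(eps):
    """Hypothetical Pareto front: convex, decreasing, delta(1) = 0."""
    return (1.0 - eps) * (2.0 - eps)

def u_star(eps):
    """Utility restricted to the front: delta(eps) / (1 - eps), for eps < 1."""
    return delta(eps) / (1.0 - eps)

def left_difference_at_one(h):
    """Backward difference quotient of delta at eps = 1 with step h > 0."""
    return (delta(1.0) - delta(1.0 - h)) / h

h = 1e-6
c_inf_estimate = u_star(1.0 - h)
slope_estimate = left_difference_at_one(h)
print(c_inf_estimate, -slope_estimate)
```

For this front, $u^*(\varepsilon) = 2 - \varepsilon$, so the limiting cost is 1, which matches $-\delta'(1) = 1$.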



B. Graphical Representation of the Analytical Results 

In this section, we graphically view the analytical results developed earlier. To this aim, we establish
a graphical procedure using the following lemma.

Lemma 8 Let $(\varepsilon, \delta(\varepsilon))$ be a point on the Pareto front and $g(\varepsilon)$ a straight line that passes through $(\varepsilon, \delta(\varepsilon))$
and $(1,0)$. The cost associated with the Pareto-optimal solution(s) corresponding to $(\varepsilon, \delta(\varepsilon))$ is both the
(negative) slope and the intercept (on the vertical axis) of $g(\varepsilon)$.

Proof: We define the straight line, $g(\varepsilon)$, as

$$g(\varepsilon) = c_1\varepsilon + c_2, \tag{106}$$

where $c_1$ is its slope and $c_2$ is its intercept on the vertical axis. Since $g(\varepsilon)$ passes through $(\varepsilon, \delta(\varepsilon))$
and $(1,0)$, its slope, $c_1$, is given by

$$c_1 = \frac{\delta(\varepsilon) - 0}{\varepsilon - 1} = -u^*(\varepsilon). \tag{107}$$

Since $g(\varepsilon)$ passes through $(1,0)$, at $\varepsilon = 1$ we have

$$c_2 = [g(\varepsilon) - c_1\varepsilon]_{\varepsilon = 1} = g(1) - c_1 = u^*(\varepsilon). \tag{108}$$ ■

Fig. 1. (a) Graphical illustration of Lemma 8. (b) Illustration of case (i) in performance-speed tradeoff. (c) Illustration of case
(ii) in performance-speed tradeoff.

Figure 1(a) illustrates Lemma 8 graphically. Let $(\varepsilon^*, \delta^*)$ be a point on the Pareto front. The cost, $c^*$, of
the utility function, $u^*(\varepsilon^*)$, is the intercept of the straight line passing through $(\varepsilon^*, \delta^*)$ and $(1,0)$.
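The slope and intercept computation of Lemma 8 takes only a few lines. The function name and the sample point below are hypothetical, introduced for illustration.

```python
# Sketch: slope and intercept of the constant-cost line of Lemma 8.
# The point (eps_star, delta_star) is a hypothetical Pareto-front point.

def constant_cost_line(eps, delta):
    """Return (c1, c2) for the line through (eps, delta) and (1, 0)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    c1 = (delta - 0.0) / (eps - 1.0)   # slope, as in (107)
    c2 = 0.0 - c1 * 1.0                # g(1) = 0 gives c2 = -c1, as in (108)
    return c1, c2

eps_star, delta_star = 0.5, 0.75
c1, c2 = constant_cost_line(eps_star, delta_star)
cost = delta_star / (1.0 - eps_star)   # u*(eps) = delta(eps) / (1 - eps)
print(c1, c2, cost)
```

Reading the cost off the vertical axis is therefore equivalent to evaluating $\delta(\varepsilon)/(1-\varepsilon)$ at the chosen point.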



C. Performance-Speed Tradeoff: $\varepsilon_{\text{exact}} = +\infty$

In this case, no matter how large we choose $\|\mathbf{P}\|$, the HDC does not converge to the exact solution.
By Lemma 1, the convergence rate of the HDC depends on $\rho(\mathbf{P})$ and thus upper bounding $\|\mathbf{P}\|$ leads
to a guarantee on the convergence rate. Also, from Lemma 6, the utility function is non-increasing as
we increase $\|\mathbf{P}\|$. We formulate the Learning Problem as a performance-speed tradeoff. From the Pareto
front and the constant cost straight lines, we can address the following two questions.
(i) Given a pre-specified performance, $C_0$ (the cost of the utility function), choose a Pareto-optimal
solution that results in the fastest convergence of the HDC to achieve $C_0$. We carry out this
procedure by drawing a straight line that passes through the points $(0, C_0)$ and $(1,0)$ in the Pareto plane.
Then, we pick the Pareto-optimal solution from the Pareto front that lies on this straight line and
also has the smallest value of $\|\mathbf{P}\|$. See Figure 1(b).
(ii) Given a pre-specified convergence speed, $\varepsilon_a$, of the HDC algorithm, choose a Pareto-optimal solution
that results in the smallest cost of the utility function, $u(\mathbf{B},\mathbf{P})$. We carry out this procedure by
choosing the Pareto-optimal solution, $(\varepsilon_a, \delta_a)$, from the Pareto front. The cost of the utility function
for this solution is then the intercept (on the vertical axis) of the constant cost line that passes
through both $(\varepsilon_a, \delta_a)$ and $(1,0)$. See Figure 1(c).
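The two selection rules can be sketched on a sampled front. Everything below is an assumption made for illustration: the sampled front reuses a hypothetical $\delta(\varepsilon) = (1-\varepsilon)(2-\varepsilon)$, the function names are invented, and case (i) is approximated on the grid by taking the smallest sampled $\varepsilon$ whose cost does not exceed $C_0$ (on a continuous front this is the point lying on the line through $(0, C_0)$ and $(1,0)$).

```python
# Sketch: performance-speed tradeoff on a sampled, hypothetical Pareto front.

front = [(k / 20.0, (1.0 - k / 20.0) * (2.0 - k / 20.0)) for k in range(20)]

def cost(eps, delta):
    """Cost of a front point: delta / (1 - eps)."""
    return delta / (1.0 - eps)

def fastest_for_cost(points, c0):
    """Case (i): sampled point with the smallest eps whose cost is <= c0."""
    feasible = [(eps, delta) for eps, delta in points if cost(eps, delta) <= c0]
    return min(feasible, key=lambda pt: pt[0]) if feasible else None

def best_cost_for_speed(points, eps_a):
    """Case (ii): sampled point with the lowest cost among eps <= eps_a."""
    feasible = [(eps, delta) for eps, delta in points if eps <= eps_a]
    return min(feasible, key=lambda pt: cost(*pt)) if feasible else None

point_i = fastest_for_cost(front, 1.5)
point_ii = best_cost_for_speed(front, 0.25)
print(point_i, point_ii, cost(*point_ii))
```

Because $u^*$ is non-increasing (Lemma 6), case (ii) always selects the point at $\varepsilon_a$ itself, and case (i) returns nothing when $C_0$ lies below every sampled cost.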
We now characterize the steady state error. Let $\mathbf{B}_0, \mathbf{P}_0$ be the operating point of the HDC obtained
from either of the two tradeoff scenarios described above. Then, the steady state error in the limiting
state, $\mathbf{X}_\infty$, of the network when the HDC with $\mathbf{B}_0, \mathbf{P}_0$ is implemented, is given by

$$e_{ss} = \|(\mathbf{I} - \mathbf{P}_0)^{-1}\mathbf{B}_0 - \mathbf{W}\|, \tag{109}$$

which is clearly bounded above by (49).

D. Exact Solution: $\varepsilon_{\text{exact}} < 1$

In this case, the optimal operating point of the HDC algorithm is the Pareto-optimal solution corresponding
to $(\varepsilon_{\text{exact}}, 0)$ on the Pareto front. A typical Pareto front in this case is shown in Figure 2,
labeled as Case I. A special case is when the sparsity pattern of $\mathbf{B}$ is the same as the sparsity of the
weight matrix, $\mathbf{W}$. We can then choose

$$\mathbf{B} = \mathbf{W}, \qquad \mathbf{P} = \mathbf{0}, \tag{110}$$

as the solution to the Learning Problem and the Pareto front is a single point $(0, 0)$, shown as Case II in
Figure 2.

If it is desirable to operate the HDC algorithm at a faster speed than that corresponding to $\varepsilon_{\text{exact}}$, we can
consider the performance-speed tradeoff in Section VII-C to get the appropriate operating point.

VIII. Conclusions 

In this paper, we present a framework for the design and analysis of linear distributed algorithms. 
We present the analysis problem in the context of Higher Dimensional Consensus (HDC) algorithms 
that contains the average-consensus as a special case. We establish the convergence conditions, the 
convergent state and the convergence rate of the HDC. We also define the consensus subspace and 
derive its dimensions and relate them to the number of anchors in the network. We present the inverse 
problem of deriving the parameters of the HDC to converge to a given state as learning in large-scale 
networks. We show that the solution of this learning problem is a Pareto-optimal solution of a
multi-objective optimization problem (MOP). We explicitly prove the Pareto-optimality of the MOP solutions.
We then prove that the Pareto front (collections of the Pareto-optimal solutions) is convex and strictly 
decreasing. Using these properties of the MOP solutions, we solve the learning problem and also formulate 
performance-speed tradeoffs. 

Appendix I 
Important Results 



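Lemma 9 If a matrix $\mathbf{P}$ is such that $\rho(\mathbf{P}) < 1$, then

$$\lim_{t \to \infty} \mathbf{P}^{t+1} = \mathbf{0}, \tag{111}$$
$$\lim_{t \to \infty} \sum_{k=0}^{t} \mathbf{P}^{k} = (\mathbf{I} - \mathbf{P})^{-1}. \tag{112}$$

Proof: The proof is straightforward.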
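Both limits in Lemma 9 are easy to observe numerically. The sketch below uses a hypothetical 2x2 matrix with eigenvalues 0.5 and 0.1, so that $\rho(\mathbf{P}) = 0.5 < 1$, and compares the partial sums of powers with the closed-form 2x2 inverse.

```python
# Sketch: partial sums of powers of P versus (I - P)^{-1} for a hypothetical P.

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse_2x2(X):
    """Closed-form inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

P = [[0.2, 0.3], [0.1, 0.4]]            # eigenvalues 0.5 and 0.1
I = [[1.0, 0.0], [0.0, 1.0]]

partial_sum = [[0.0, 0.0], [0.0, 0.0]]
power = I                                # holds P^k, starting at k = 0
for _ in range(200):
    partial_sum = mat_add(partial_sum, power)
    power = mat_mul(power, P)
# Here partial_sum = sum_{k=0}^{199} P^k and power = P^200.

I_minus_P = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]
exact = inverse_2x2(I_minus_P)
max_gap = max(abs(partial_sum[i][j] - exact[i][j])
              for i in range(2) for j in range(2))
print(max_gap, max(abs(x) for row in power for x in row))
```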



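Lemma 10 Let $r_Q$ be the rank of the $M \times M$ matrix $(\mathbf{I} - \mathbf{P})^{-1}$, and $r_B$ the rank of the $M \times n$ matrix $\mathbf{B}$; then

$$\operatorname{rank}\left((\mathbf{I} - \mathbf{P})^{-1}\mathbf{B}\right) \le \min(r_Q, r_B), \tag{113}$$
$$\operatorname{rank}\left((\mathbf{I} - \mathbf{P})^{-1}\mathbf{B}\right) \ge r_Q + r_B - M. \tag{114}$$

Proof: The proof is available on pages 95-96 in [25].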
Appendix II
Necessary Condition

Below, we provide a necessary condition required for the existence of an exact solution of the Learning Problem.

Lemma 11 Let $\rho(\mathbf{P}) < 1$, $K < M$, and let $r_W$ denote the rank of a matrix $\mathbf{W}$. A necessary condition for
$(\mathbf{I} - \mathbf{P})^{-1}\mathbf{B} = \mathbf{W}$ to hold is

$$r_B = r_W. \tag{115}$$

Proof: Note that the matrix $\mathbf{I} - \mathbf{P}$ is invertible since $\rho(\mathbf{P}) < 1$. Let $\mathbf{Q} = (\mathbf{I} - \mathbf{P})^{-1}$; then $r_Q = M$. From
Lemma 10 in Appendix I and since by hypothesis $K < M$,

$$\operatorname{rank}(\mathbf{Q}\mathbf{B}) \le r_B, \tag{116}$$
$$\operatorname{rank}(\mathbf{Q}\mathbf{B}) \ge M + r_B - M = r_B. \tag{117}$$

The condition (115) now follows, since from (34), we also have

$$\operatorname{rank}(\mathbf{Q}\mathbf{B}) = r_W. \tag{118}$$
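The rank condition (115) can be observed on a small example in exact arithmetic. In the sketch below the matrices are hypothetical ($M = 3$, $K = 2$), sparsity constraints are ignored, and $\mathbf{B}$ is built as $(\mathbf{I} - \mathbf{P})\mathbf{W}$ so that $(\mathbf{I} - \mathbf{P})^{-1}\mathbf{B} = \mathbf{W}$ holds by construction without computing an inverse.

```python
# Sketch: if (I - P)^{-1} B = W with I - P invertible, then rank(B) = rank(W).
# Hypothetical 3x3 P (row sums <= 1/2, hence rho(P) < 1) and 3x2 matrices W.
from fractions import Fraction as F

def rank(X):
    """Rank via Gaussian elimination in exact rational arithmetic."""
    rows = [[F(x) for x in row] for row in X]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 4)],
     [F(0), F(1, 2), F(0)]]
I_minus_P = [[(F(1) if i == j else F(0)) - P[i][j] for j in range(3)]
             for i in range(3)]

W_full = [[F(1), F(0)], [F(1, 2), F(1, 2)], [F(0), F(1)]]   # rank 2
W_low = [[F(1), F(2)], [F(1), F(2)], [F(1), F(2)]]          # rank 1

B_full = mat_mul(I_minus_P, W_full)   # B such that (I - P)^{-1} B = W_full
B_low = mat_mul(I_minus_P, W_low)     # B such that (I - P)^{-1} B = W_low
print(rank(B_full), rank(W_full), rank(B_low), rank(W_low))
```

A sparsity pattern that forces $r_B < r_W$ therefore rules out an exact solution regardless of how $\mathbf{P}$ is chosen.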